A Reproducible Data Analysis Workflow
Abstract
In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools, including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to journal styles, and full cross-platform computational reproducibility. The workflow is designed to meet three primary goals: 1) the reported statistical results are consistent with the actual statistical results (dynamic report generation), 2) the analysis reproduces exactly at a later point in time, even if the computing platform or software has changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented, while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools for reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. By combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproduction and reuse of code and data.
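A small sketch can show the dependency tracking that Make contributes to such a workflow. Make regenerates a target, such as the rendered manuscript, in two cases: when the target does not exist, or when any of its prerequisites (for example, the R Markdown source or a data file) has a newer modification time. The Python function below mimics only that single rule. It is an illustration under this simplified assumption, not code from the tutorial. The file names `manuscript.Rmd`, `data.csv`, and `manuscript.pdf` are hypothetical.

```python
import os


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Decide whether `target` is out of date, following Make's basic rule.

    The target must be regenerated if it does not exist yet, or if any
    prerequisite has a later modification time than the target.
    A missing prerequisite raises FileNotFoundError (Make would likewise
    stop with an error if it had no rule to create that file).
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Strictly newer prerequisites trigger a rebuild; equal times do not.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


# Hypothetical usage: does the manuscript need to be re-rendered?
# needs_rebuild("manuscript.pdf", ["manuscript.Rmd", "data.csv"])
```

Real Make applies this check recursively through the whole dependency graph and then runs the recipe for each out-of-date target. The sketch covers only the decision for one target.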
Similar resources
Reproducible Research Workflow in R for the Analysis of Personalized Human Microbiome Data
This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine created using Bioconductor packages and the knitr markdown interface. We show that sometimes a multiplicity of choices and lack of consistent documentation at each stage of the sequential processing pipeline used for the analysis of microbiome data can lead to spurious results. W...
Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems
Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collec...
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
Container technologies such as Docker [1] are transforming the way distributed systems are deployed onto cloud platforms by providing a simple mechanism for packaging and isolating an application and its dependencies from the host machine on which it is running. The same ideas and technologies can be applied to computational science applications to obtain exceptional ease of installation and re...
A tool for reproducible research: From data analysis (in
Much scientific research makes use of commonly available 'office' software. While numerous more fully-featured open-source alternatives exist, the integration of diverse tools and platforms which their use often entails can be challenging. The package for Emacs aims to bring together a number of these elements with the goal of simplifying the process of converting an .R file, as use...
Towards a scientific blockchain framework for reproducible data analysis
Publishing reproducible analyses is a long-standing and widespread challenge [1] for the scientific community, funding bodies and publishers [2, 3, 4]. Although a definitive solution is still elusive [5], the problem is recognized to affect all disciplines [6, 7, 8] and lead to a critical system inefficiency [9]. Here, we propose a blockchain-based approach to enhance scientific reproducibility...
Journal
Journal title: Quantitative and computational methods in behavioral sciences
Year: 2021
ISSN: 2699-8432
DOI: https://doi.org/10.5964/qcmb.3763